PM2.5 concentrations in the Bay Area

The CalEnviroScreen PM2.5 indicator is defined as “Annual mean concentration of PM2.5 (weighted average of measured monitor concentrations and satellite observations, µg/m3), over three years (2015 to 2017).” From this map, it is clear that the highest concentrations of PM2.5 are directly around the bay, particularly along the East Bay. The areas on the north and south ends of the Bay Area counties have much lower PM2.5 air pollution than areas like Oakland, Berkeley, and Richmond.

Rates of Asthma in the Bay Area

To measure asthma, CalEnviroScreen uses the “Spatially modeled, age-adjusted rate of emergency room visits for asthma per 10,000 (averaged over 2015-2017).” This map shows particularly high rates of asthma in Richmond, Oakland, south of Alameda, and in the eastern parts of Solano and Contra Costa counties. There is a stark difference between the high rates here and the relatively low rates of asthma in surrounding areas like San Francisco, San Mateo, Marin, and Sonoma counties. There are also some small pockets of high asthma rates in San Jose and in the Bayview-Hunters Point neighborhood of San Francisco.

Asthma hospitalizations as a function of PM2.5 concentrations in the Bay Area

The best-fit line does not describe the data well. The points are widely scattered and show no clear linear trend.

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_bay_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16

The residuals are not symmetric around zero: the median is -9.61, and the maximum (182.95) is much larger in magnitude than the minimum (-54.47). However, the p-values for the intercept and slope are very small (< 2e-16), so the relationship can be considered statistically significant. The R-squared value is quite low (about 0.096), though, suggesting that PM2.5 explains little of the variation in asthma.

An increase of 1 µg/m3 in PM2.5 is associated with an increase of 19.862 in the asthma rate (emergency room visits per 10,000); about 9.6% of the variation in asthma is explained by the variation in PM2.5 (multiple R-squared = 0.09606).
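As a worked reading of the coefficients in the linear summary, the fitted line can be written out and evaluated at a single PM2.5 value. The value of 9 µg/m3 is chosen only for illustration and is not taken from the data.

$$
\widehat{\text{Asthma}} = -116.278 + 19.862 \times \text{PM2.5}
$$

$$
\widehat{\text{Asthma}}\,\big|_{\text{PM2.5}=9} = -116.278 + 19.862 \times 9 = -116.278 + 178.758 = 62.480
$$

Setting the fitted value to zero gives PM2.5 = 116.278 / 19.862 ≈ 5.85, so this line would predict negative asthma rates for any tract below roughly 5.85 µg/m3, which is one more sign that a straight line on the raw scale is an awkward description of these data.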

The peak of this distribution is not at zero (it appears closer to -25), and the distribution is heavily skewed and asymmetrical. There is a high concentration of residuals between -50 and 25, and a long tail in the positive direction. This suggests that the linear regression is not a very good representation of our data.
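The skew described here can also be checked numerically rather than by eye. The sketch below is one possible way to do that; `residual_shape` is a hypothetical helper name (not part of the original analysis), and the usage lines assume `ces4_bay_data` has the `Asthma` and `PM2.5` columns shown in the model call.

```r
# Hypothetical helper (not from the original analysis): summarise the shape
# of a fitted model's residuals.
residual_shape <- function(model) {
  r <- residuals(model)
  r <- r[!is.na(r)]  # drop any NA padding left by na.exclude
  centered <- r - mean(r)
  c(
    mean = mean(r),
    median = median(r),
    # Moment-based sample skewness; values above 0 indicate a longer right tail
    skewness = mean(centered^3) / mean(centered^2)^1.5
  )
}

# Usage, assuming `ces4_bay_data` is loaded as in the summaries shown here:
# fit_linear <- lm(Asthma ~ PM2.5, data = ces4_bay_data)
# residual_shape(fit_linear)
# hist(residuals(fit_linear), breaks = 50, main = "Residuals, linear model")
```

For a least-squares fit with an intercept, the mean of the residuals is zero by construction, so a median well below zero together with a positive skewness is what signals the long positive tail.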

Log of asthma hospitalizations as a function of PM2.5 concentrations in the Bay Area

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_bay_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16

This is a better-looking distribution of residuals. The center of the distribution is around 0, and it is more or less symmetrically distributed between -2 and 2, with a slight tail in the negative direction. This suggests that regressing log(Asthma) on PM2.5 is a better fit for our data than the untransformed linear regression.
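Because the response in this model is log(Asthma), the slope is read on a multiplicative scale rather than an additive one. A short worked version, using only the coefficients reported in the summary:

$$
\widehat{\log(\text{Asthma})} = 0.69234 + 0.35633 \times \text{PM2.5}
$$

$$
\frac{\widehat{\text{Asthma}}(\text{PM2.5} + 1)}{\widehat{\text{Asthma}}(\text{PM2.5})} = e^{0.35633} \approx 1.428
$$

Under this model, each additional 1 µg/m3 of PM2.5 is associated with roughly a 43% higher predicted asthma rate, where the prediction is obtained by back-transforming the fitted log value.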

A negative residual suggests that our regression model is over-estimating the rates of asthma given the concentrations of PM2.5 in these areas. The most negative residuals occur in areas directly east of Berkeley and Oakland and in large parts of San Mateo County. This suggests that in these areas the rates of asthma are actually lower than we would predict from PM2.5 data alone, which could be due to a variety of demographic or socioeconomic factors in these areas. The highest residuals seem to occur in the dense centers of the East Bay and in the outer areas, including south of San Jose and the far northern and eastern parts of the Bay Area.
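Mapping residuals requires one residual per row of the data, but the summary reports that one observation was dropped for missingness, so a plain `residuals()` call would return one value fewer than there are rows. One possible way to keep the rows aligned is sketched below; `add_log_residuals` and the `log_resid` column are hypothetical names introduced here, not part of the original analysis.

```r
# Hypothetical helper: attach log-model residuals to the data frame while
# keeping every row in its original position.
add_log_residuals <- function(data) {
  # na.exclude drops incomplete rows for fitting, but makes residuals()
  # return NA in their place instead of a shorter vector
  fit <- lm(log(Asthma) ~ PM2.5, data = data, na.action = na.exclude)
  data$log_resid <- residuals(fit)
  data
}

# Usage, assuming `ces4_bay_data` has the Asthma and PM2.5 columns:
# ces4_bay_resid <- add_log_residuals(ces4_bay_data)
# Negative values mark tracts where the model over-predicts asthma:
# head(ces4_bay_resid[order(ces4_bay_resid$log_resid), ])
```

The row with the missing value then simply carries an `NA` residual, which most mapping tools draw as an unfilled polygon.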